Floating Point Fault Tolerance with Backward Error Assertions
Abstract
This paper introduces an assertion scheme based on backward error analysis for error detection in algorithms that solve dense systems of linear equations, Ax = b. Unlike previous methods, this Backward Error Assertion Model is specifically designed to operate in an environment of floating point arithmetic subject to round-off errors, and it can be easily instrumented in a Watchdog processor environment. The complexity of verifying assertions is O(n^2), compared to the O(n^3) complexity of algorithms solving Ax = b. Unlike other proposed error detection methods, this assertion model does not require any encoding of the matrix A. Experimental results under various error models are presented to validate the effectiveness of this assertion scheme.
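As a rough illustration of the idea, the sketch below checks a computed solution of Ax = b with the standard normwise relative backward error, ||b - Ax|| / (||A|| ||x|| + ||b||), in the infinity norm. This is a textbook formula and not necessarily the exact assertion used in the paper. The function names `backward_error` and `assertion_holds`, the `slack` factor, and the threshold `slack * n * eps` are assumptions made for this example. The check needs one matrix-vector product and one pass over A, so it costs O(n^2) operations and uses A unmodified.

```python
import sys


def backward_error(A, x, b):
    """Normwise relative backward error of x for A x = b (infinity norm).

    Textbook formula, used here as an assumed stand-in for the paper's assertion.
    """
    n = len(A)
    # Residual r = b - A x: one matrix-vector product, O(n^2) operations.
    r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    norm_r = max(abs(v) for v in r)
    # Infinity norm of A is the largest absolute row sum, also O(n^2).
    norm_A = max(sum(abs(v) for v in row) for row in A)
    norm_x = max(abs(v) for v in x)
    norm_b = max(abs(v) for v in b)
    denom = norm_A * norm_x + norm_b
    return norm_r / denom if denom > 0 else 0.0


def assertion_holds(A, x, b, slack=8.0):
    """Return True if the backward error is within an assumed round-off budget.

    The threshold slack * n * machine epsilon is a hypothetical choice.
    """
    tol = slack * len(A) * sys.float_info.epsilon
    return backward_error(A, x, b) <= tol


if __name__ == "__main__":
    A = [[4.0, 1.0], [2.0, 3.0]]
    b = [1.0, 2.0]
    good = [0.1, 0.6]          # solution correct up to round-off
    bad = [0.1, 0.6 + 1e-6]    # same solution with an injected error
    print(assertion_holds(A, good, b))
    print(assertion_holds(A, bad, b))
```

A threshold proportional to machine epsilon lets ordinary round-off pass while larger corruptions of the solution fail the check. The right constant depends on the solver and the error model, so treat `slack` as a tuning parameter.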
Similar resources
An Application-Oriented Approach to Distributed Error-Detecting Branch & Bound
An important aspect which is often overlooked in software design of distributed environments is that of fault tolerance. Many methodologies in the past have attempted to provide fault tolerance efficiently, but have never been successful at eliminating explicit time and space redundancy. One approach is the Application-Oriented Fault Tolerance Paradigm, which provides fault tolerance by examini...
Teraflops Supercomputer: Architecture and Validation of the Fault Tolerance Mechanisms
Intel Corporation developed the Teraflops supercomputer for the US Department of Energy (DOE) as part of the Accelerated Strategic Computing Initiative (ASCI). This is the most powerful computing machine available today, performing over two trillion floating point operations per second with the aid of more than 9,000 Intel processors. The Teraflops machine employs complex hardware and software...
Concurrent Error-Detection and Modular Fault-tolerance in a 32-bit Processing Core for Embedded Space Flight Applications
This paper describes the concurrent error-detection methods employed in the ERC32, a 32-bit processing core for embedded space flight applications. The processor core consists of three devices; an integer unit, a floating point unit and a memory controller. All three devices are provided with internal concurrent error-detection, mainly to detect transient errors. Over 98% of all latched errors ...
Aspect Oriented Software Fault Tolerance
Software fault tolerance demands additional tasks like error detection and recovery through executable assertions, exception handling, diversity and redundancy based mechanisms. These mechanisms do not come for free, rather they introduce additional complexity to the core functionality. This paper presents light weight error detection and recovery mechanisms based on the rate of change in signa...
COFTA: Hardware-Software Co-Synthesis of Heterogeneous Distributed Embedded Systems
Embedded systems employed in critical applications demand high reliability and availability in addition to high performance. Hardware-software co-synthesis of an embedded system is the process of partitioning, mapping, and scheduling its specification into hardware and software modules to meet performance, cost, reliability, and availability goals. In this paper, we address the problem of hardw...
Journal: IEEE Trans. Computers
Volume: 44
Publication date: 1995